The Annals of Statistics 

2010, Vol. 38, No. 2, 1010-1033 

DOI: 10.1214/09-AOS732 

© Institute of Mathematical Statistics, 2010

OPTIMAL AND FAST DETECTION OF SPATIAL CLUSTERS 
WITH SCAN STATISTICS

By Guenther Walther 

Stanford University 

We consider the detection of multivariate spatial clusters in the 
Bernoulli model with N locations, where the design distribution has
weakly dependent marginals. The locations are scanned with a rect- 
angular window with sides parallel to the axes and with varying sizes 
and aspect ratios. Multivariate scan statistics pose a statistical prob- 
lem due to the multiple testing over many scan windows, as well 
as a computational problem because statistics have to be evaluated 
on many windows. This paper introduces methodology that leads to 
both statistically optimal inference and computationally efficient al- 
gorithms. The main difference to the traditional calibration of scan 
statistics is the concept of grouping scan windows according to their 
sizes, and then applying different critical values to different groups. It 
is shown that this calibration of the scan statistic results in optimal 
inference for spatial clusters on both small scales and on large scales, 
as well as in the case where the cluster lives on one of the marginals. 
Methodology is introduced that allows for an efficient approximation 
of the set of all rectangles while still guaranteeing the statistical op- 
timality results described above. It is shown that the resulting scan 
statistic has a computational complexity that is almost linear in N.

1. Introduction and overview of results. Spatial scan statistics are used 
to detect clusters in spatial data and are widely used, for example, in epi- 
demiology, biosurveillance and astronomy. In this paper, we consider the 
Bernoulli model in $\mathbb{R}^2$, which is important in many of the above
applications; see, for example, Kulldorff (1999). All of the results in this paper can
easily be extended to higher dimensions, but focusing on the important two- 
dimensional case simplifies the exposition and the notation. The Bernoulli 
model states that there are N locations in $\mathbb{R}^2$, and that each location has a
label associated with it that takes on one of two possible outcomes, say 0 and
1. Conditional on the locations, the values of these labels are realizations of
independent Bernoulli random variables with parameter p at all locations in
a certain set R and parameter q at all locations in $R^c$.
The null hypothesis is 

$H_0: p = q$

and the alternative hypothesis is $H_1: q < p$ for some unknown set R out of a
certain class of sets, which in this paper is taken to be the class of rectangles 
with sides parallel to the axes and with arbitrary sizes and aspect ratios. 
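
To make the model concrete, the following minimal sketch simulates data from it. It is an illustration only: the function name `simulate_bernoulli_model`, the uniform design on the unit square, and the tuple encoding of the rectangle are assumptions made here, not part of the paper.

```python
import random

def simulate_bernoulli_model(n_locations, rect, p, q, seed=0):
    """Simulate the Bernoulli model on the unit square (illustrative assumption).

    rect = (x0, x1, y0, y1) is a hypothetical axis-parallel cluster R.
    Labels are Bernoulli(p) inside R and Bernoulli(q) outside R.
    """
    rng = random.Random(seed)
    x0, x1, y0, y1 = rect
    points, labels = [], []
    for _ in range(n_locations):
        x, y = rng.random(), rng.random()
        inside = x0 <= x <= x1 and y0 <= y <= y1
        prob = p if inside else q
        points.append((x, y))
        labels.append(1 if rng.random() < prob else 0)
    return points, labels
```

Under the null hypothesis one would call this with p equal to q, in which case the rectangle has no influence on the labels.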

Example 1. Each location represents the spatial location of a person
who is either healthy (label 0) or diseased (label 1). If q < p, then R
represents the local area of a disease outbreak. The task is to detect such 
disease clusters R where the disease density is significantly higher than the 
population density; see, for example, Kulldorff (1999) and the references 
given there. 

Example 2. A flow cytometer measures various numerical character- 
istics of each of a large number of cells. Thus, each cell can be identified 
with a point in Euclidean space. One task in the analysis of flow cytometry
data is to describe local regions where two cell distributions are different. 
Roederer and Hardy (2001) and Roederer et al. (2001) model such regions 
as rectangles with sides parallel to the axes. The reason for this is that such 
sets are easy to interpret and easy to implement on instruments for further 
processing. Also, sometimes the difference effect lives on a lower-dimensional 
subspace, which is a special case of axis-parallel rectangles. Suppose one has 
a sample drawn i.i.d. from a distribution G, and a second sample drawn 
i.i.d. from a distribution H. Label each observation in the first sample with
a 0 and each observation in the second sample with a 1. If G = H, then the
problem is left invariant under permutations of the labels. It will be seen 
shortly that thus the inference for Example 2 can proceed identically to that 
for Example 1. 

Further examples are given, for example, in Kulldorff (1999). There is a 
large body of work on univariate scan statistics; see, for example, the refer- 
ences in Glaz and Balakrishnan (1999) and in Glaz, Naus and Wallenstein 
(2001), but the multivariate case is much less well developed. One reason 
is that computational issues play a prominent role when multivariate scan 
windows need to be evaluated at possibly many locations. Some references 
for the multivariate case that are relevant for the problem studied here are 
Naus (1965), Loader (1991), Chen and Glaz (1996), Alm (1997), Anderson
and Titterington (1997), Kulldorff (1997, 1999), Naiman and Priebe (2001),
and on the computational aspect, Neill and Moore (2004a, 2004b). A recent
reference for the multivariate two-sample problem is Rohde (2009), who con- 
structs regions of significant difference based on nearest-neighbor statistics. 

To simplify the exposition, we will assume that the distribution F of the 
locations is continuous and has independent marginals. All of the main re- 
sults continue to hold if the marginals are weakly dependent, for example, 
if they are $\psi$-mixing. But a completely general design of the locations
requires some modifications, which will be reported elsewhere. It is also
convenient to only consider rectangles R with $F(R) \le 1/8$, which is an innocuous
restriction for most problems.

In our analysis, we will condition on the sample size N and on the N
locations. Under the null hypothesis, the problem is then left invariant under 
permutations of the labels, and exact finite sample significance statements 
for rectangles R can be obtained by a permutation test. There are two 
major problems associated with such an inference: as the class of rectangles 
is large, a statistical problem arises in the form of multiple testing, and 
a computational problem arises due to the need to evaluate test statistics 
on many rectangles. This paper introduces methodology that leads to both 
statistically optimal inference and computationally efficient algorithms. 
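
As an illustration of the permutation argument for a single fixed rectangle, the sketch below estimates a Monte Carlo p-value by permuting the labels. It is a simplified stand-in: the statistic used here (the number of 1's inside the rectangle) and the name `permutation_p_value` are assumptions chosen for brevity, not the statistic studied in the paper.

```python
import random

def permutation_p_value(in_rect, labels, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for one fixed rectangle.

    in_rect[t] is True if location t lies in the rectangle.
    The statistic is the count of 1-labels inside the rectangle
    (a simplification used only for this sketch).
    """
    rng = random.Random(seed)
    observed = sum(lab for inside, lab in zip(in_rect, labels) if inside)
    perm = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # labels are exchangeable under the null hypothesis
        stat = sum(lab for inside, lab in zip(in_rect, perm) if inside)
        if stat >= observed:
            exceed += 1
    # Counting the observed labelling itself keeps the p-value valid.
    return (exceed + 1) / (n_perm + 1)
```

Repeating this over many rectangles without adjustment is exactly where the multiple testing problem described above arises.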

The conventional definition of a scan statistic is 

(1) $\max_{R \in \mathcal{R}} T(R)$,

where $\mathcal{R}$ is a set of scan windows such as the set of rectangles described
above, and T is a standardized test statistic that is evaluated locally for 
each scan window. Critical values are then derived for this overall maximum. 
In this paper, we propose to use size-dependent critical values obtained by 
grouping windows according to their size: all windows that contain between
$2^{-\ell-1}N$ and $2^{-\ell}N$ locations are grouped into one block, $\ell \ge 3$. Then we
use different critical values for different blocks as proposed by Rufibach and 
Walther (2009) in a certain univariate context. The heuristic motivation for 
this approach is the following: there are of the order N disjoint windows
containing a small number of locations each. As the corresponding local
statistics T will be roughly independent, the maximum over the small windows
will behave like the maximum of N i.i.d. random variables. This will
tend to be stochastically much larger than the maximum over windows of
size N/8 (say), which will roughly behave like the maximum of 8 of these
i.i.d. random variables. Thus, the distribution of the conventional scan statis- 
tic (1) will be dominated by the small windows, with a corresponding loss 
of power for larger windows. Grouping windows according to their size and 
employing size-dependent critical values makes it possible to remedy this effect. Indeed,
it will be shown below that this methodology allows for the following large 
sample results: 

• If the effect q < p lives on a small rectangle, then the blocked scan statistic
is essentially as powerful as any test can possibly be. 

• If the effect q<p lives on a large rectangle, then the blocked scan statistic 
is again optimal, even in comparison to tests that are allowed to use 
a priori knowledge of the correct window size. That is, scanning with 
different window sizes does not result in a significant penalty. 

• If the effect q < p lives on one of the two marginals, then the above 
optimality results still hold in the one-dimensional framework. That is, 
scanning with two-dimensional rectangles does not result in a significant 
penalty even if it is known a priori that the effect lives on a univariate 
marginal. 
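
The heuristic behind size-dependent critical values (a maximum over about N roughly independent small windows versus a maximum over about 8 large ones) can be checked with a small simulation. This is a sketch under the simplifying assumption that the local statistics are i.i.d. standard normal; the function name is hypothetical.

```python
import random

def mean_max_of_normals(k, reps=200, seed=0):
    """Average, over `reps` replications, of the maximum of k i.i.d. N(0,1) draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += max(rng.gauss(0.0, 1.0) for _ in range(k))
    return total / reps

# The maximum over many small windows dominates the maximum over few large ones,
# so a single critical value for the overall maximum is driven by the small windows.
small_windows = mean_max_of_normals(1000)
large_windows = mean_max_of_normals(8)
```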

We will give heuristic explanations of these results as well as rigorous 
mathematical statements. These results use a concentration inequality for 
the hypergeometric distribution which may be of independent interest. The 
optimality results require the use of size-dependent critical values, and it 
appears that such methodology has not been used before for scan statistics. 
In the setting of univariate function estimation in the Gaussian White Noise 
model, Dümbgen and Spokoiny (2001) employed a scale-dependent penalty
term. It is not clear how a useful penalty term can be derived for the prob- 
lem under consideration here. Also, the univariate results in Rufibach and 
Walther (2009) suggest that the block procedure yields a better finite sample 
performance for relevant sample sizes. 

The construction of efficient algorithms, and also the particular proof 
of the above optimality results, requires an economical approximation of 
the set of all rectangles. We prove an approximation theorem that allows 
for an adequate approximation of the set of all rectangles by $O(N \log^2 N)$
rectangles. By comparison, there are $O(N^4)$ rectangles that contain different
subsets of the N locations. As a consequence, it will be shown that the 
blocked scan statistic can be implemented with a computational complexity 
that is almost linear in N. 

It will be seen that there is a close connection between the computational 
approximation scheme and the statistical inference, with the grouping of 
rectangles according to their size being a central theme in each case. 

2. The blocked scan statistic. Kulldorff (1997) derives the log-likelihood
ratio statistic for a given scan window R as

$$T(R) = n\Big(\hat p \log\frac{\hat p}{\bar p} + (1-\hat p)\log\frac{1-\hat p}{1-\bar p}\Big) + (N-n)\Big(\hat q \log\frac{\hat q}{\bar p} + (1-\hat q)\log\frac{1-\hat q}{1-\bar p}\Big)$$

if $\hat q < \hat p$, and $T(R) = 0$ otherwise. Here, n is the number of locations in
R, $\bar p$ is the overall proportion of 1's, and $\hat p$ and $\hat q$ are the proportions of 1's in R
and in $R^c$, respectively. Despite its cumbersome form, this statistic has been
widely adopted in the computer science literature; see, for example, Neill
and Moore (2004a, 2004b). The concentration inequality given in Theorem
4 shows that this transformation of $\hat q$ and $\hat p$ has the benefit of a clean tail
behavior.

Fig. 1. Constructing an approximating set of rectangles. Units on the axes are with
respect to the marginal distributions $F_X$ and $F_Y$.
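
The local statistic can be computed from four counts. The sketch below assumes the standard Bernoulli log-likelihood ratio form (the same expression as L(x) in Theorem 4) with the convention 0 log 0 = 0; the function names and the argument layout are illustrative choices, not taken from the paper.

```python
import math

def kl_bernoulli(a, b):
    """a*log(a/b) + (1-a)*log((1-a)/(1-b)), with 0*log(0) = 0; requires 0 < b < 1."""
    out = 0.0
    if a > 0:
        out += a * math.log(a / b)
    if a < 1:
        out += (1 - a) * math.log((1 - a) / (1 - b))
    return out

def scan_statistic(ones_in, n_in, ones_total, n_total):
    """Log-likelihood ratio statistic T(R) for one window R.

    ones_in / n_in: number of 1-labels / locations inside R;
    ones_total / n_total: the same counts over all locations.
    """
    if n_in == 0 or n_in == n_total or ones_total in (0, n_total):
        return 0.0  # degenerate cases carry no evidence
    p_bar = ones_total / n_total
    p_hat = ones_in / n_in
    q_hat = (ones_total - ones_in) / (n_total - n_in)
    if q_hat >= p_hat:
        return 0.0  # one-sided alternative q < p
    return n_in * kl_bernoulli(p_hat, p_bar) + (n_total - n_in) * kl_bernoulli(q_hat, p_bar)
```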

$T(R)$ is zero if $\hat p = \hat q$ and positive if $\hat q < \hat p$. We restrict ourselves to the
alternative hypothesis q < p for notational simplicity and because this case
is most relevant for applications, but all the following results continue to
hold for the alternative $p \ne q$ after a simple modification of the definition of
$T(R)$.

We will evaluate T on a set of rectangles that is a good approximation to
the set $\mathcal{R} := \{\text{axis-parallel rectangles in } \mathbb{R}^2\}$. The following theorem has a
constructive proof that shows how to construct an economical set of rectangles
that approximate all rectangles in $\mathcal{R}$ whose size (as measured in terms
of F) is about s, namely $\mathcal{R}(s) := \{R \in \mathcal{R}: s/2 < F(R) \le s\}$.

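Theorem 1. For every $s, \varepsilon \in (0,1)$, there exists $\mathcal{R}_{app}(s,\varepsilon) \subset \mathcal{R}$ such
that:

1. For every $R \in \mathcal{R}(s)$, there exists $R' \in \mathcal{R}_{app}(s,\varepsilon)$ with $F(R \,\triangle\, R') \le \varepsilon F(R)$.

2. $\#\mathcal{R}_{app}(s,\varepsilon) \le C s^{-1}\varepsilon^{-4}\log(2/s)$ for a universal constant C.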

The idea for the approximation scheme is depicted in Figure 1 and explained
in Section 3. To construct an approximation for all of $\mathcal{R}$ we proceed
as follows. First, note that $\mathcal{R} = \bigcup_{\ell=0}^{\infty} \mathcal{R}(2^{-\ell})$. Second, the construction of
$\mathcal{R}_{app}(s,\varepsilon)$ depends on F, which is typically unknown. To obtain an approximation
that depends on the observations only, we replace F by the empirical
measure $F_N$ in the construction of $\mathcal{R}_{app}(s,\varepsilon)$ and call the resulting set
$\mathcal{R}_{app,N}(s,\varepsilon)$. Then we define our approximating set as

(2) $\mathcal{R}_{app,N} := \bigcup_{\ell=3}^{\lfloor \log_2(N/(2\log N)) \rfloor} \mathcal{R}_{app,N}(2^{-\ell}, \ell^{-1/2})$.

The particular choice $\varepsilon = \ell^{-1/2}$ yields the optimality results given in Theorem
2 below. Thus, the smaller the rectangle, the finer the approximation
relative to the size of the rectangle. Section 4 gives an algorithm that constructs
$\mathcal{R}_{app,N}$. The precise result about this approximation is as follows.



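Corollary 1. There exists $\mathcal{R}_{app,N} \subset \mathcal{R}$ depending only on the observations such
that:

1. For every $R \in \mathcal{R}$ with $F(R) \in [\frac{2\log N}{N}, \frac{1}{8}]$, there exists $R' \in \mathcal{R}_{app,N}$ with
$R' \subset R$ and $F(R \,\triangle\, R') \le \frac{9}{8}\,\frac{F(R)}{\sqrt{\lfloor \log_2(1/F(R)) \rfloor}}$ with probability converging to
1 uniformly in R and F.

2. $\#\mathcal{R}_{app,N} \le C N \log^2 N$ for a universal constant C.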

By comparison, a naive enumeration of all rectangles that contain different
subsets of the locations results in $O(N^4)$ rectangles and therefore is
generally computationally infeasible. The algorithm in Section 4 computes 
the scan statistic T over 72.app,7v ™ O(A^log^A^) steps, i.e., with a compu- 
tation time that is almost linear in A^. 

We will call $\mathcal{R}_{app,N}(2^{-\ell}, \ldots)$ the $\ell$th block of rectangles. The idea for the
statistical methodology is closely connected to this approximation scheme:
as all the rectangles in the $\ell$th block have about the same size, we will
assign to those rectangles the same critical value. Following the criterion
given in Rufibach and Walther (2009), we set these critical values such that
the significance level of the $\ell$th block decreases as $\ell^{-2}$.

In more detail, let $\alpha \in (0,1)$ and define $q_\ell(\alpha)$ to be the $(1-\alpha)$-quantile of
$\max_{R \in \ell\text{th block}} T(R)$ when the labels are permuted randomly. For notational
convenience, we suppress the dependence of $q_\ell(\alpha)$ on the sample size N and
on $\bar p$. Let $\tilde\alpha$ be the largest number such that

(3) $P\Big(\bigcup_{\ell=3}^{\lfloor \log_2(N/(2\log N)) \rfloor} \big\{\max_{R \in \ell\text{th block}} T(R) > q_\ell(\tilde\alpha/\ell^2)\big\}\Big) \le \alpha$.

By construction, one can then claim with guaranteed simultaneous finite
sample confidence $1 - \alpha$ that $H_0$ is violated on every rectangle R on which
$T(R) > q_\ell(\tilde\alpha/\ell^2)$, where $\ell$ is the block index of R. As explained in Rufibach
and Walther (2009), it is advantageous in practice to replace $\ell^2$ by, for
example, $(10 + \ell)^2$, and all of the following results also apply for such a
modification.

The $q_\ell$ can be readily simulated with a simple extension of the usual
Monte Carlo technique for a permutation test: for each block, one records in
a list the maximum for each Monte Carlo permutation of the labels. Then
one can use a bisection method on the lists of sorted maxima to find $\tilde\alpha$.
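
The calibration just described can be sketched as follows. This is one possible reading, not the author's implementation: the input layout `maxima[ell][r]` (maximum of T over block ell in permutation r), the empirical quantile convention, and the search interval are all assumptions made for this illustration.

```python
import math

def empirical_quantile(sorted_vals, level):
    """Recorded value that is exceeded by at most a `level` fraction of the list."""
    m = len(sorted_vals)
    idx = min(m - 1, max(0, math.ceil((1 - level) * m) - 1))
    return sorted_vals[idx]

def calibrate_blocks(maxima, alpha, iters=40):
    """Bisection for the largest a with simulated familywise error <= alpha.

    maxima: dict mapping block index ell to a list of per-permutation maxima
    (hypothetical layout). Returns (a, {ell: critical value at level a/ell^2}).
    """
    blocks = sorted(maxima)
    n_perm = len(maxima[blocks[0]])
    sorted_max = {ell: sorted(maxima[ell]) for ell in blocks}

    def critical_values(a):
        return {ell: empirical_quantile(sorted_max[ell], min(1.0, a / ell ** 2))
                for ell in blocks}

    def familywise_error(a):
        crit = critical_values(a)
        hits = sum(any(maxima[ell][r] > crit[ell] for ell in blocks)
                   for r in range(n_perm))
        return hits / n_perm

    # Search interval [0, alpha * ell_min^2]: at the right end the smallest
    # block alone is already tested at level alpha.
    lo, hi = 0.0, alpha * blocks[0] ** 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if familywise_error(mid) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo, critical_values(lo)
```

The bisection relies on the familywise error being nondecreasing in a, which holds here because every block's critical value can only decrease as its level a/ell^2 increases.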



3. Optimality. In the following, we consider a growing sample size N.
The result below allows for a quite general situation where the rectangle $R_N$
may vary with N, likewise the probabilities of success $p_N$ in $R_N$ and $q_N$ in
$R_N^c$, and the design distribution $F^N$. For simplicity, we denote probabilities
under this model by $P_N$. The key quantity for detecting q < p on some
rectangle R turns out to be

D{F{R),p, q) := F{R){1 - 

$D(F(R),p,q)$ increases, and hence the detection of R becomes easier, if for
fixed $F(R)$ and q the difference $p - q$ increases, as one would expect. If $F(R)$
and $p - q$ are fixed, then $D(F(R),p,q)$ increases as $(p+q)/2$ moves away
from 1/2, i.e., detection is easier if the background probability q is closer to
0 or if p is closer to 1. Theorem 2 below quantifies when detection is possible
and shows that the blocked scan statistic is optimal for detecting both small
rectangles, that is, when $F^N(R_N) \to 0$, and large rectangles, that is, when
$\liminf F^N(R_N) > 0$. An appropriate way to formulate these optimality results
is via the asymptotic minimax framework; see, e.g., the investigation
of univariate shape properties on small scales in the Gaussian white noise
model in Dümbgen and Spokoiny (2001) and the results for small and large
scales in the context of a univariate density in Dümbgen and Walther (2008).



Theorem 2. (a) Let $\{(F^N, R_N, p_N, q_N)\}$ be an arbitrary sequence of
parameters with

D{F-iR.),p.,q.) > (2 + ..)MV^^ 



Then

$P_N$(the blocked scan statistic finds a significant rectangle $R \subset R_N$) $\to 1$.

(b1) Let $\phi_N$ be any sequence of tests with asymptotic level $\alpha \in (0,1)$ under
$H_0$. For any prescribed sequence of continuous distributions $\{F^N\}$, there
exists a sequence of parameters $\{(R_N, p_N, q_N)\}$ such that

D\b [nN),PN,qN) > - £n) 



with En 10, £Af Y^log j,jv('j:; ~y ~^ OQ> and lim7vPAr('^Ar rejects)<a. 

This result continues to hold if one also prescribes the values {F'^ [Rj\i)} 
and {en}, provided that {logN)'^/N < F^{Rn) — ^ and eat. /log 



oo. 

(b2) Let {F^} be any sequence of continuous distributions and {Rn} o-ny 
sequence of rectangles with F^ {RN)il- (Rn)) > 0, letbN e [Q,NF^ {Rn){1- 

(R^))) , and let (f)^ be any test with asymptotic level a € (0, 1) under the 
null hypothesis that the probability of success on R^ equals that on Rj^. If 

^Ni'pN rejects) — )• 1 

for every sequence of parameters {{pn,(1n)} that satisfies D{F'^ {RN),pN,qN) > 
TV 



then necessarily 6^? — t- oo. 



Parts (a) and (b1) show that in the case of small rectangles there is a
cutoff at $D = 2\log\frac{1}{F(R)}/N$: if $D \ge (2+\varepsilon_N)\log\frac{1}{F(R)}/N$ with $\varepsilon_N \to 0$
sufficiently slowly, then the blocked scan statistic will detect the rectangle
$R_N$ with asymptotic power one. On the other hand, if D is of the size
$(2-\varepsilon_N)\log\frac{1}{F(R)}/N$, then no test can exist that detects the rectangle
with nontrivial asymptotic power. These two statements leave essentially no
room for any other test to beat the blocked scan statistic for detecting small
rectangles.

In the case of large rectangles, part (b2) states that any test $\phi_N$ can
have asymptotic power 1 only if $ND \to \infty$. But under the latter condition,
the blocked scan statistic also has asymptotic power 1 [because $ND \to \infty$
arbitrarily slowly is sufficient for the claim in (a) if $\log\frac{1}{F(R)}$ stays bounded].
Note that (b2) even allows the competing test $\phi_N$ to possess prior knowledge
of the rectangle R.

These results clarify the tradeoff when using a scan statistic with varying 
window size. On the one hand, one can evidently gain substantial power by 
matching the window size with the extent of the effect. On the other hand, 
varying the window size incurs a multiple testing penalty. The above results 
show that this multiple testing penalty becomes negligible for large samples, 
provided one employs an appropriate calibration of the various window sizes 
such as the blocked scan statistic introduced here. An illustration will be 
given in Section 4. 

Furthermore, there is no substantial multiple testing penalty for searching 
over multivariate rectangles when the effect lives on one of the marginals. 

Theorem 3. Suppose that the $\{R_N\}$ are in fact intervals on one of the
two axes. Then the conclusions of Theorem 2 continue to hold, even if the
tests $\phi_N$ in (b1) and (b2) are allowed to use the prior knowledge about which
axis the $\{R_N\}$ live on.

A heuristic explanation of this result is as follows: Figure 1(a) depicts
$\frac{1}{s}$ disjoint rectangles with content s. The rectangles of Figure 1(b) are obtained
by doubling the width of certain rectangles in Figure 1(a) and then
dividing the rectangle into two with a horizontal split. After $\log\frac{1}{s}$ iterations
one obtains the rectangles of Figure 1(c). The idea of Theorem 1 is that in
the case of independent or weakly dependent marginals the totality of these
rectangles (after a refinement allowing, e.g., certain translations) constitutes
an economical set of rectangles that approximates well the set of all rectangles
with content s. The difficulty of the multiple testing problem depends
essentially on the cardinality of this approximating set, as local statistics
that pertain to rectangles with large overlap will be highly correlated and
thus will not affect the multiple testing problem much. But the construction
depicted in Figure 1 results in $\frac{1}{s}\log\frac{1}{s}$ rectangles, which up to the log term is
of the same order as the $\frac{1}{s}$ rectangles in the univariate case of Figure 1(a).
Thus, one expects that the multiple testing problem in this multivariate
situation will not be significantly more difficult than in the univariate case.

The proof of Theorem 2 makes use of the following concentration inequal- 
ity for the hypergeometric distribution. 

Theorem 4. Let X denote the number of red items among n items
drawn without replacement out of N items of which R are red. Then

$P(X \ge x) \le C(L(x)+2)\exp(-L(x))$ for $x \ge m := nR/N$,

$P(X \le x) \le C(L(x)+2)\exp(-L(x))$ for $x \le m$

and

$P(L(X) \ge x) \le 2C(x+2)\exp(-x)$ for $x \ge 0$,

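where $L(x) := n\big(\hat p \log\frac{\hat p}{\bar p} + (1-\hat p)\log\frac{1-\hat p}{1-\bar p}\big) + (N-n)\big(\hat q \log\frac{\hat q}{\bar p} + (1-\hat q)\log\frac{1-\hat q}{1-\bar p}\big)$,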
p:=§,p:=^,q:=§E^, and C := 2exp{,^(i + ^)}. 

This inequality compares to the classical concentration bound obtained
from the Chernoff-Hoeffding theorem as follows. Hoeffding [(1963), Theorem
1 and Section 6] gives $P(X \ge x) \le \exp\big(-n(\hat p \log\frac{\hat p}{\bar p} + (1-\hat p)\log\frac{1-\hat p}{1-\bar p})\big)$, where
$\hat p = x/n$ and $\bar p = R/N$ as in Theorem 4. A
Taylor series expansion shows that for $\hat p$ near $\bar p$ the exponent behaves like
$-n\frac{(\hat p - \bar p)^2}{2\bar p(1-\bar p)}$, whereas $-L(x) \approx -n\frac{(\hat p - \bar p)^2}{(1-n/N)\,2\bar p(1-\bar p)}$. Thus, Theorem 4 accounts
for the variance correction factor for sampling without replacement.
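
The comparison with the Chernoff-Hoeffding exponent can be checked numerically. The sketch below assumes $\hat p = x/n$, $\bar p = R/N$ and $\hat q = (R-x)/(N-n)$; the function names are illustrative. It computes both exponents and the exact hypergeometric upper tail, so one can see that $L(x)$ exceeds the Hoeffding exponent by roughly the factor $1/(1-n/N)$ when $\hat p$ is near $\bar p$.

```python
import math

def kl_bernoulli(a, b):
    """a*log(a/b) + (1-a)*log((1-a)/(1-b)), with 0*log(0) = 0; requires 0 < b < 1."""
    out = 0.0
    if a > 0:
        out += a * math.log(a / b)
    if a < 1:
        out += (1 - a) * math.log((1 - a) / (1 - b))
    return out

def hoeffding_exponent(x, n, R, N):
    """Exponent n * KL(x/n || R/N) of the Chernoff-Hoeffding bound."""
    return n * kl_bernoulli(x / n, R / N)

def L_exponent(x, n, R, N):
    """L(x): adds the contribution of the N - n items that were not drawn."""
    p_bar = R / N
    q_hat = (R - x) / (N - n)
    return n * kl_bernoulli(x / n, p_bar) + (N - n) * kl_bernoulli(q_hat, p_bar)

def hypergeom_upper_tail(x, n, R, N):
    """Exact P(X >= x) for the number of red items among n draws without replacement."""
    total = math.comb(N, n)
    num = sum(math.comb(R, k) * math.comb(N - R, n - k)
              for k in range(x, min(n, R) + 1))
    return num / total
```

For example, with N = 1000, R = 300, n = 400 and x = 130, the ratio `L_exponent / hoeffding_exponent` is close to 1/(1 - 0.4).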

Derbeko, El-Yaniv and Meir (2004) give an improvement to the Chernoff-Hoeffding
bound that is weaker than that of Theorem 4. Rohde (2009) gives
a Bernstein-type inequality for the hypergeometric distribution.

Scanning on a grid and comparison with the algorithm of Neill and Moore.
Neill and Moore (2004a, 2004b) give an algorithm that runs in
steps for data that are binned on a $\sqrt{N} \times \sqrt{N}$ grid. That algorithm produces
a rectangle R that attains $\max_R T(R)$ by partitioning the grid into overlapping
regions, bounding $\max T$ over subregions, and pruning regions which
cannot contain the maximum. Thus, both the algorithm of Neill and Moore 
and the algorithm introduced here run in almost linear time; see Proposi- 
tion 1 below. Both algorithms achieve this by using an approximation. The 
algorithm of Neill and Moore approximates the data by binning them on a 
grid, and then finds the exact maximum over all rectangles on the grid. In 
contrast, the methodology introduced here evaluates rectangles on the exact 
data, but approximates the set of all rectangles. The results of this section 
show that this algorithm results in a solution that is statistically optimal. 
It is an open problem whether the algorithm of Neill and Moore results in a 
solution that is statistically optimal, and how the grid has to be constructed 
to achieve this. 

Consider now the related problem where one observes a Bernoulli random 
variable on each grid point of a $\sqrt{N} \times \sqrt{N}$ grid. Then the design distribution
F has independent marginals, but it is not continuous any more. Still, the 
methodology introduced in this paper can be readily adapted to this set-up 
and shown to be statistically optimal. The conclusions of Theorems 2 and 
3 continue to hold. However, the condition on the size of $F(R)$ in Theorem
2(b1) now has to be set differently for the marginal effects considered in
Theorem 3. In the multivariate case, (b1) allows rectangles R as small as
$F(R) \ge \log^2 N/N$ [and (a) even allows detection if $F(R) \ge 2\log N/N$], which
results in a detection threshold of about $2\log N/N$ for rectangles with these
sizes. But any nonempty marginal interval R necessarily satisfies $F(R) \ge
\sqrt{N}/N$ due to the nature of the grid, which results in a detection threshold
of about $\log N/N$ for the smallest detectable marginal intervals.
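
The two thresholds quoted above follow from evaluating the small-rectangle cutoff $2\log(1/F(R))/N$ at the smallest admissible sizes. The short sketch below does this arithmetic; the function name is hypothetical, and the statement "about $2\log N/N$" is a leading-order one, so at moderate N the first value is visibly smaller than $2\log N/N$.

```python
import math

def detection_cutoff(F_R, N):
    """Cutoff 2 * log(1 / F(R)) / N for the key quantity D on small rectangles."""
    return 2 * math.log(1 / F_R) / N

N = 10 ** 6
# Smallest rectangles allowed in (b1): F(R) = log(N)^2 / N.
multivariate = detection_cutoff(math.log(N) ** 2 / N, N)
# Smallest nonempty marginal interval on a sqrt(N) x sqrt(N) grid: F(R) = sqrt(N) / N.
marginal = detection_cutoff(math.sqrt(N) / N, N)
```

For the marginal interval the cutoff equals $\log N/N$ exactly, since $\log(1/F(R)) = \frac{1}{2}\log N$.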

Controlling $\max_R T(R)$. The cardinality of the approximating set of
rectangles is small enough so that the tail behavior of $\max_{R \in \ell\text{th block}} T(R)$
can be controlled quite precisely by simply using Boole's inequality; see (10)
below. This is in contrast to the approximating set of intervals introduced
by Rufibach and Walther (2009) for certain multiple testing problems on
the line. While that set leads to computationally efficient algorithms, its
cardinality is still so large that the control of $\max_{I \in \ell\text{th block}} T(I)$ requires in
addition the difficult stochastic control of the increments of $T(I)$ as a process
in I. In light of the above results, one may surmise that for these and
related problems, there typically exists an approximating set that not only
allows for statistical optimality and computationally efficient algorithms,
but which also obviates the need for the stochastic control of the increments
of T when used in conjunction with the block procedure. In particular, it
may be possible to recover the optimality results for the inference problems
treated in Rufibach and Walther (2009) with the univariate version of the
algorithm introduced here.

4. Algorithm. It is helpful to use the following notation in this section.
The coordinates of the N locations are $(X_1,Y_1),\ldots,(X_N,Y_N)$, and we write
$X_{(r)} := X_{(\mathrm{round}(r) \wedge N)}$ for real r, where $X_{(1)} \le \cdots \le X_{(N)}$ are the order statistics
of $X_1,\ldots,X_N$. Each location has a label that is either 0 or 1. Here is
the pseudo-code to enumerate the set of approximating rectangles and to
compute the corresponding local test statistics:

Sort the locations $(X_1,Y_1),\ldots,(X_N,Y_N)$ according to the X-value.
for $\ell = 3, \ldots, \lfloor \log_2 \frac{N}{2\log N} \rfloor$ do:
    Set $s := 2^{-\ell}$, $\varepsilon := \ell^{-1/2}/6$.
    for $i = 0, \ldots, \ell$ do:
        for $j = 0, \ldots, \lfloor (\varepsilon s 2^i)^{-1} \rfloor$ do:
            for $k = j+1, \ldots, j + \lfloor 1/\varepsilon \rfloor$ do:
                Extract all locations $(X_p,Y_p)$ for which $X_p$ falls in the interval
                $\mathcal{X}_{jk} := [X_{(j\varepsilon s 2^i N + 1)}, X_{(k\varepsilon s 2^i N)}]$ and denote by
                $N_{jk}$ the number of these locations.
                Sort these extracted $Y_p$ and compute the vector of cumulative
                sums of the labels of the $(X_p,Y_p)$ corresponding to the sorted $Y_p$.
                for $m = 0, \ldots, \lfloor 2^i/\varepsilon \rfloor$ do:
                    for $n = m+1, \ldots, m + \lfloor 2/\varepsilon \rfloor$ do:
                        Compute the test statistic on the rectangle
                        $\mathcal{X}_{jk} \times [Y_{(m\varepsilon 2^{-i} N_{jk} + 1)}, Y_{(n\varepsilon 2^{-i} N_{jk})}]$, where the order
                        statistics $Y_{(\cdot)}$ are with respect to the extracted $Y_p$.

The running time of the algorithm is almost linear in N. 
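
The pseudo-code can be turned into a short runnable sketch. The version below is an illustration under several assumptions and is not the author's implementation: the generator name `enumerate_rectangles` is hypothetical, Python's built-in `round` stands in for round(r), empty index ranges are skipped, and inner loops stop early once an upper index is capped at the sample size (which only removes repeated rectangles). It yields the counts needed for the local statistic and makes no attempt to reproduce the running time stated in Proposition 1.

```python
import math

def enumerate_rectangles(points, labels):
    """Yield (block, n_in, ones_in) for each approximating rectangle.

    points: list of (x, y) pairs; labels: list of 0/1 values.
    Order-statistic indices are 1-based and capped at the sample size.
    """
    N = len(points)
    by_x = sorted(range(N), key=lambda t: points[t][0])
    max_block = math.floor(math.log2(N / (2 * math.log(N))))
    for ell in range(3, max_block + 1):
        s, eps = 2.0 ** -ell, ell ** -0.5 / 6
        for i in range(ell + 1):
            step_x = eps * s * 2 ** i * N
            for j in range(math.floor(1 / (eps * s * 2 ** i)) + 1):
                lo_x = min(round(j * step_x + 1), N)
                for k in range(j + 1, j + math.floor(1 / eps) + 1):
                    hi_x = min(round(k * step_x), N)
                    if hi_x < lo_x:
                        continue
                    # Locations whose X-rank lies in [lo_x, hi_x], sorted by Y.
                    strip = sorted(by_x[lo_x - 1:hi_x], key=lambda t: points[t][1])
                    n_jk = len(strip)
                    cum = [0]  # cumulative sums of labels along sorted Y
                    for t in strip:
                        cum.append(cum[-1] + labels[t])
                    step_y = eps * 2.0 ** -i * n_jk
                    for m in range(math.floor(2 ** i / eps) + 1):
                        lo_y = min(round(m * step_y + 1), n_jk)
                        for n in range(m + 1, m + math.floor(2 / eps) + 1):
                            hi_y = min(round(n * step_y), n_jk)
                            if hi_y >= lo_y:
                                yield ell, hi_y - lo_y + 1, cum[hi_y] - cum[lo_y - 1]
                            if hi_y == n_jk:
                                break  # larger n would repeat the same rectangle
                    if hi_x == N:
                        break  # larger k would repeat the same X-interval
```

Each yielded triple gives the block index, the number of locations in the rectangle and the number of 1-labels among them, which together with the overall counts is all that the local statistic T requires.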

Proposition 1. The above algorithm runs in 0{Nlog^N) time. 

We illustrate the methodology with an example where 1000 locations are 
drawn from a mixture of four bivariate normals. The labels are Bernoulli 
with p = 0.4, except in the strip [x > 5], where p = 0.6, and in the box 
[1,2] × [3,5], where p = 0.75. Critical values for the conventional scan statistic
(1) and the blocked scan statistic (3) were computed with 50000 random
permutations of the labels. Figure 2 (left) shows all minimal (with respect
to inclusion) boxes that are significant at the 5% level using the calibration
for the traditional scan statistic. Thus, we are 95% confident that each of
the depicted boxes contains a so-called overdensity, that is, somewhere inside
the box the probability of success p is larger than outside the box. Figure
2 (right) shows the resulting significant boxes when the calibration for the
blocked scan statistic is used. In addition to detecting the small box at
[1,2] × [3,5], the blocked scan statistic also detects the large box [x > 5]. It
was found that this result was a frequent outcome for realizations of this
example.

Fig. 2. Minimal significant rectangles obtained with the conventional calibration for the
scan statistic (left) and the blocked scan statistic (right). Locations having label 1 are
plotted in red, locations with label 0 are black.

5. Proofs. 

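Proof of Theorem 1. We parametrize rectangles as follows: $R = (x, x', y, y')$
denotes the rectangle with the vertices $(x,y)$, $(x',y)$, $(x',y')$ and
$(x,y')$, where $x, x', y, y' \in [-\infty, \infty]$ and $x < x'$, $y < y'$. We will approximate
$\mathcal{R}(s)$ by the finite set $\mathcal{R}_{app}(s,\varepsilon)$, which for notational simplicity we define for
$\varepsilon \in (0, \frac{1}{6})$ by

$$\mathcal{R}_{app}(s,6\varepsilon) := \big\{R: R = (x_j, x_k, y_m, y_n) := \big(F_X^{-1}(j\varepsilon s 2^i), F_X^{-1}(k\varepsilon s 2^i), F_Y^{-1}(m\varepsilon 2^{-i}), F_Y^{-1}(n\varepsilon 2^{-i})\big)\big\},$$

where i, j, k, m and n are integers with $0 \le i \le \lceil \log_2 \frac{1}{s} \rceil$,
$0 \le j \le \lfloor (\varepsilon s 2^i)^{-1} \rfloor$, $j+1 \le k \le j + \lfloor 1/\varepsilon \rfloor$, $0 \le m \le \lfloor 2^i/\varepsilon \rfloor$ and
$m+1 \le n \le m + \lfloor 2/\varepsilon \rfloor$. Here, $F_X^{-1}$ and $F_Y^{-1}$ denote the quantile functions
of the first and second marginals of F, respectively, with $F^{-1}(p) = -\infty$ for
$p \le 0$ and $F^{-1}(p) = \infty$ for $p \ge 1$.

Thus,

$$\#\mathcal{R}_{app}(s,6\varepsilon) \le \sum_{i=0}^{\lceil \log_2(1/s) \rceil} \Big(\frac{1}{\varepsilon s 2^i} + 1\Big)\frac{1}{\varepsilon}\Big(\frac{2^i}{\varepsilon} + 1\Big)\frac{2}{\varepsilon} \le 4 s^{-1}\varepsilon^{-4}\log_2(4/s).$$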
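
The cardinality bound can be sanity-checked by counting the admissible index tuples (i, j, k, m, n) directly. The sketch below assumes the index ranges from the definition of the approximating set with the parametrization by $6\varepsilon$ (so $\varepsilon < 1/6$); both function names are hypothetical.

```python
import math

def count_index_tuples(s, eps):
    """Number of index tuples (i, j, k, m, n) in the ranges used in the proof."""
    total = 0
    for i in range(math.ceil(math.log2(1 / s)) + 1):
        n_j = math.floor(1 / (eps * s * 2 ** i)) + 1   # j = 0, ..., floor((eps*s*2^i)^-1)
        n_k = math.floor(1 / eps)                      # k = j+1, ..., j + floor(1/eps)
        n_m = math.floor(2 ** i / eps) + 1             # m = 0, ..., floor(2^i/eps)
        n_n = math.floor(2 / eps)                      # n = m+1, ..., m + floor(2/eps)
        total += n_j * n_k * n_m * n_n
    return total

def cardinality_bound(s, eps):
    """Right-hand side 4 * s^-1 * eps^-4 * log2(4/s)."""
    return 4 / (s * eps ** 4) * math.log2(4 / s)
```

For instance, `count_index_tuples(1/8, 0.1)` stays below `cardinality_bound(1/8, 0.1)`, in line with the displayed inequality.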

Now, let $R = (x, x', y, y') \in \mathcal{R}(s)$. We will show that there exists an $R' \in
\mathcal{R}_{app}(s,6\varepsilon)$ with $F(R \,\triangle\, R') \le 6\varepsilon s$ and that one can even arrange that $R' \subset
R$. To this end set $i := \lceil \log_2 \frac{F_X([x,x'])}{s} \rceil$. Thus,

(4) $s2^{i-1} < F_X([x,x']) \le s2^i$,

so the index i is assigned to rectangles whose "length," as measured by $F_X$,
lies between $\frac{s}{2}2^i$ and $s2^i$. Let j be the smallest integer such that $x_j \ge x$,
let k be the largest integer so that $x_k \le x'$, let m be the smallest integer
with $y_m \ge y$, and let n be the largest integer such that $y_n \le y'$. It will be
shown below that these indices fall in the ranges given in the definition
of $\mathcal{R}_{app}(s,6\varepsilon)$, hence $R' := (x_j, x_k, y_m, y_n) \in \mathcal{R}_{app}(s,6\varepsilon)$, and by definition
$R' \subset R$. Further,

(5) $F(R \,\triangle\, R') = F([x,x'] \times ([y,y_m] \cup [y_n,y'])) + F(([x,x_j] \cup [x_k,x']) \times [y_m,y_n])$.

Now, $F_Y([y,y_m]) \le \varepsilon 2^{-i}$ by the definition of m, and the same bound applies
to $F_Y([y_n,y'])$. Likewise, both $F_X([x,x_j])$ and $F_X([x_k,x'])$ are not larger
than $\varepsilon s 2^i$. Hence, (5) is not larger than

$2F_X([x,x'])\,\varepsilon 2^{-i} + 2\varepsilon s 2^i F_Y([y,y'])$

$= 2\varepsilon(z + sF(R)/z)$ where $z := F_X([x,x'])2^{-i} \in (s/2, s]$ by (4)

$\le 2\varepsilon(s/2 + 2F(R))$ since $s/2 < F(R)$

$\le 6\varepsilon F(R)$.

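It remains to show that i, j, k, m and n fall in the ranges given in the definition
of $\mathcal{R}_{app}(s,6\varepsilon)$: $1 \ge F_X([x,x']) \ge F(R) > s/2$ implies $0 \le i \le \lceil \log_2 \frac{1}{s} \rceil$.
Clearly, $j \ge 0$. For $\bar j := \lfloor (\varepsilon s 2^i)^{-1} \rfloor$, we have $x_{\bar j} \ge F_X^{-1}(1 - \varepsilon s 2^i) \ge F_X^{-1}(1 -
2\varepsilon F_X([x,x'])) > x$ by (4) and as $\varepsilon < 1/6$. Hence, $j \le \bar j$. Next,

$(k-j)\varepsilon s 2^i = F_X([x_j,x_k]) \le F_X([x,x']) \le s2^i$ and
$(k-j)\varepsilon s 2^i = F_X([x_j,x_k]) \ge F_X([x,x']) - 2\varepsilon s 2^i > 0$,

by (4) and since $\varepsilon < 1/6$. Hence, $1 \le k - j \le 1/\varepsilon$. Clearly, $m \ge 0$. Further,
$s/2 < F(R) \le (F_X([x_j,x_k]) + 2\varepsilon s 2^i) \times (F_Y([y_m,\infty)) + \varepsilon 2^{-i}) = (k - j +
2)\varepsilon s 2^i(1 - (m-1)\varepsilon 2^{-i})$. Together with $(k - j + 2)\varepsilon \le 3$ (see above), this
inequality yields $6(m-1)\varepsilon < 6 \times 2^i - 1$, whence $m < 2^i/\varepsilon - 1/(6\varepsilon) + 1 \le 2^i/\varepsilon$
since $\varepsilon < 1/6$. Finally, $s/2 < F(R) \le (F_X([x_j,x_k]) + 2\varepsilon s 2^i) \times (F_Y([y_m,y_n]) +
2\varepsilon 2^{-i})$. With (4) this yields $F_Y([y_m,y_n]) > 2^{-i-1}/(1+2\varepsilon) - \varepsilon 2^{-i+1} > 0$ since
$\varepsilon < 1/6$, hence $n > m$. Likewise, $s \ge F(R) \ge F_X([x_j,x_k]) \times F_Y([y_m,y_n])$ together
with (4) gives $s > s2^{i-1}(n-m)\varepsilon 2^{-i}$, hence $n - m < 2/\varepsilon$. □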



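Proof of Corollary 1. Define the random collection of rectangles as
in (2), where $\mathcal{R}_{app,N}(s,\varepsilon)$ is defined as $\mathcal{R}_{app}(s,\varepsilon)$ in the proof of Theorem 1
but with $F_X^{-1}$ and $F_Y^{-1}$ replaced by the empirical quantile functions $F_{N,X}^{-1}$ and
$F_{N,Y}^{-1}$, respectively. That is, we consider rectangles $R = (x_j, x_k, y_m, y_n)$ such
that $(x_j, x_{j+1}]$ has empirical measure $F_{N,X}$ about equal to $\frac{\varepsilon}{6}s2^i$, likewise for
$(x_k, x_{k+1}]$; further $(y_m, y_{m+1}]$ and $(y_n, y_{n+1}]$ have empirical measure $F_{N,Y}$
about equal to $\frac{\varepsilon}{6}2^{-i}$, where $0 \le i \le \lceil \log_2 \frac{1}{s} \rceil$. Then result 2 of Theorem 1
yields

$$\#\mathcal{R}_{app,N} \le \sum_{\ell=3}^{\lfloor \log_2(N/(2\log N)) \rfloor} \#\mathcal{R}_{app,N}(2^{-\ell}, \ell^{-1/2}) \le 2CN\log^2 N.$$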



Now, let $R = (x, x', y, y')$ be a rectangle parametrized as in the proof of Theorem
1. Set $\ell := \lfloor \log_2 \frac{1}{F(R)} \rfloor$. Then $3 \le \ell \le \lfloor \log_2 \frac{N}{2\log N} \rfloor$ by the assumptions
on R. We will construct another deterministic rectangle $\tilde R = (\tilde x, \tilde x', \tilde y, \tilde y')$ such
that for $\gamma = 1/8$:

(A) F(R\R)<(l + -f)-, — ^ ; 

(B) P(there exists R' G 7?.app,jv(2~^ ^ ^Z^) -.RcR' CR)>1- 16 



The claim of the corollary then follows. 

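To keep familiar notation set $s := 2^{-\ell}$ and $\varepsilon := \ell^{-1/2}$. Define

\[
i := \Bigl\lceil\log_2\frac{F_X([x,x'])}{s}\Bigr\rceil,
\]

so

\[
2^{i-1} < \frac{F_X([x,x'])}{s} \le 2^{i}. \tag{6}
\]

Then $0 \le i \le \lceil\log_2\frac{1}{s}\rceil$ as required in the definition of $\mathcal{R}_{\mathrm{app},N}(s,\varepsilon)$.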
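As an illustration of how the dyadic scale index $i$ is tied to the mass of the $x$-side, the sketch below computes $i = \lceil\log_2(F_X([x,x'])/s)\rceil$ and checks the sandwich (6) together with the range $0 \le i \le \lceil\log_2(1/s)\rceil$. The function names and the numerical values are hypothetical; the range check relies on $F_X([x,x']) > s/2$, as in the text.

```python
import math

def scale_index(fx: float, s: float) -> int:
    """Return i = ceil(log2(fx / s)) for an x-side mass fx and block size s."""
    if not (0.0 < fx <= 1.0 and 0.0 < s <= 1.0):
        raise ValueError("fx and s must lie in (0, 1]")
    return math.ceil(math.log2(fx / s))

def satisfies_6(fx: float, s: float, i: int) -> bool:
    """Check the sandwich 2^(i-1) < fx / s <= 2^i."""
    return 2.0 ** (i - 1) < fx / s <= 2.0 ** i

# Hypothetical example: s = 2^-3 and an x-marginal mass of 0.3.
s = 2.0 ** -3
fx = 0.3
i = scale_index(fx, s)
in_range = 0 <= i <= math.ceil(math.log2(1.0 / s))
print(i, satisfies_6(fx, s, i), in_range)
```

Floating-point `log2` can be off by one ulp when `fx / s` is an exact power of two, so a production version would probably compare against integer powers directly.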

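To construct $\tilde R$, define $\tilde x,\tilde x',\tilde y,\tilde y'$ such that $F_X([x,\tilde x]) = F_X([\tilde x',x']) = (1+\gamma)\varepsilon s2^i/6$ and $F_Y([y,\tilde y]) = F_Y([\tilde y',y']) = (1+\gamma)\varepsilon 2^{-i}/6$. Then $x < \tilde x < \tilde x' < x'$ and $y < \tilde y < \tilde y' < y'$: by (6), $(1+\gamma)\varepsilon s2^i/6 < F_X([x,x'])/2$. Likewise, the definition of $\ell$ implies $s = 2^{-\ell} < 2F(R)$, hence $2^{-i} < \frac{2F(R)}{s2^i} \le \frac{2F(R)}{F_X([x,x'])} = 2F_Y([y,y'])$ by (6), so that $(1+\gamma)\varepsilon 2^{-i}/6 < F_Y([y,y'])/2$, which yields $y < \tilde y < \tilde y' < y'$. Thus, $\tilde R \subset R$ and

\[
\begin{aligned}
F(R\setminus\tilde R) &= F\bigl([\tilde x,\tilde x'] \times ([y,\tilde y]\cup[\tilde y',y'])\bigr) + F\bigl(([x,\tilde x]\cup[\tilde x',x']) \times [y,y']\bigr)\\
&\le (1+\gamma)\varepsilon F(R) \quad\text{as in the proof of Theorem 1,}
\end{aligned}
\]

using $s/2 < F(R)$ by the definition of $\ell$. This establishes (A). Next,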

at most —s2^N observations in \x,x 
6 ^ ^ 

= ¥{FN,xi[x,x])<es2') 

= ¥{FN,xi[x,x]) - Fxi[x,x]) < -7£s2V6) 

1+7 

- AT 2 oi/a Chebyshev 
1 + 7 

< 3- ' 



7V(log2)(logiV) 

ogTN 
N 



since s2* > Fx{[x,x']) > F{R) > 2^ by (6) and £ < loggiV. Hence, with 
probability at least 1 — 4 ^^^"|^^^ an endpoint xj of one the rectangles in 

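$\mathcal{R}_{\mathrm{app},N}(2^{-\ell},\ell^{-1/2})$ falls into $[x,\tilde x]$. Analogously, one can show that some $x_k$, $y_m$ and $y_n$ fall into $[\tilde x',x']$, $[y,\tilde y]$ and $[\tilde y',y']$, respectively. Hence, there exists a rectangle $R' \in \mathcal{R}_{\mathrm{app},N}(2^{-\ell},\ell^{-1/2})$ that satisfies $\tilde R \subset R' \subset R$. (B) follows. □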

Proof of Theorem 2(a). To simplify notation we write $(F,R,p,q)$ for $(F^N,R_N,p_N,q_N)$. It will become clear that these parameters may vary with $N$ due to the uniformity of the following results. As usual, $F_N$ will denote the empirical measure pertaining to $F$.



Set $b_N := \varepsilon_N\sqrt{\log\frac{1}{F(R)}} \to \infty$. (a) follows after showing:
(A) there exists a rectangle $R' \in \mathcal{R}_{\mathrm{app},N}$ with $R' \subset R$ and



T{R') > log -i- + . bx log ) ^ 1. 



F{R) y ^ F{R) 
(B) $R'$ belongs to a block $\ell$ whose critical value is not larger than $\log\frac{1}{F(R)} + 6\log\log\frac{1}{F(R)} + \gamma$, with $\gamma$ depending on $\alpha$ only.

For the proof of (A), one verifies that the condition D{F{R),p,q) > (2 + 

Eat) log -p^/A^ together with the inequality < 1 for g < p implies 

F{R) > So the block index i := [logg -p^\ of R satisfies 3<£< 

[log, 

2i(^Af J ■ Thus, by Corollary 1, with probability converging to 1 there 
exists R' G 7?.app,Ar such that R' C R and 

F{R') > F{R) (l ^ =] 

>F{R){1-Xn), 



where Xn := | y bM+[iog'2i^/F(R))\ ' ^^^^ inequality follows from 3(1 + 

Llog2 F[fi) J /^Af ) < (1 + Llog2 pp?) J ^ F Llog2 Fpj) J ^ ^^^S^ enough, 
using [log2 > 3. 

Denote by $\hat p$ and $\hat q$ the proportion of 1's in $R'$ and $(R')^c$, respectively. On the

event ^at .- i f{R){i-F{R)) - ^ ~ 24 '^^Af' F9^-'---r'p--^~^'T=?- 
1 — ^} we have q < p as q < p. The function /(p) := plog | + (1 — p) log 

satisfies ^(p) = 0, l'{p) = 0, and = - 0"^ > - ^)"^ for ^ G 

[g,p]. Thus, Taylor's theorem gives on 

T{R') = {#R')lip) + {N-#R')liq) 



2p{l-q) ' '2p{l-q) 

{#R'){N-i^R') {p-qf 
N 2p{l - q) 



>[l-^^^\M]ND{F{R),p,q)/2 




> log ^ + ^ ( ( 1 - |a^) bM - f Aa. Jlog 




- + ^^^"^ " 14076^)^log since A^v < 2/3. 
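The Taylor step above bounds the scan statistic from below by a quadratic in $\hat p - \hat q$. The sketch below checks that inequality numerically. It assumes, as a reading of the display $T(R') = (\#R')\,l(\hat p) + (N-\#R')\,l(\hat q)$, that $l(x) = x\log(x/\bar p) + (1-x)\log((1-x)/(1-\bar p))$ with $\bar p$ the overall proportion of 1's; the function names and the counts are hypothetical.

```python
import math

def kl_bernoulli(a: float, b: float) -> float:
    """a*log(a/b) + (1-a)*log((1-a)/(1-b)) for a, b strictly inside (0, 1)."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def lr_statistic(n_in: int, ones_in: int, n_total: int, ones_total: int) -> float:
    """T = n_in * l(p_hat) + (n_total - n_in) * l(q_hat), with l centred at p_bar."""
    p_hat = ones_in / n_in
    q_hat = (ones_total - ones_in) / (n_total - n_in)
    p_bar = ones_total / n_total
    return (n_in * kl_bernoulli(p_hat, p_bar)
            + (n_total - n_in) * kl_bernoulli(q_hat, p_bar))

def taylor_lower_bound(n_in: int, ones_in: int, n_total: int, ones_total: int) -> float:
    """n_in (N - n_in) / N * (p_hat - q_hat)^2 / (2 p_hat (1 - q_hat)), for q_hat < p_hat."""
    p_hat = ones_in / n_in
    q_hat = (ones_total - ones_in) / (n_total - n_in)
    return (n_in * (n_total - n_in) / n_total
            * (p_hat - q_hat) ** 2 / (2 * p_hat * (1 - q_hat)))

# Hypothetical window: 100 of 1000 locations, 70 of the 350 ones fall inside.
t_stat = lr_statistic(100, 70, 1000, 350)
bound = taylor_lower_bound(100, 70, 1000, 350)
print(t_stat >= bound)
```

The bound uses $l''(\xi) = 1/(\xi(1-\xi)) \ge 1/(\hat p(1-\hat q))$ on $[\hat q,\hat p]$, so it is only claimed for $0 < \hat q < \hat p < 1$.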

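(A) follows once we show that $\mathbb{P}_N(\mathcal{A}_N) \to 1$. As for the first event in $\mathcal{A}_N$, the proof of Corollary 1 provided a deterministic rectangle $\tilde R$ with $F(\tilde R) \ge (1-\lambda_N)F(R) \ge \frac{2}{3}\log N/N$ and $\mathbb{P}_N(\tilde R \subset R' \subset R) \to 1$. Chebyshev's inequality gives for $c = 1/24$:

\[
\mathbb{P}_N\Bigl(\Bigl|\frac{F_N(\tilde R)}{F(\tilde R)} - 1\Bigr| \ge c\lambda_N\Bigr)
\le \frac{F(\tilde R)}{N F(\tilde R)^2 c^2\lambda_N^2}
\le \frac{3}{2(\log N)\lambda_N^2 c^2}
\le \frac{27}{4c^2\min(b_N\log 2,\log N)} \to 0 \quad\text{by (8)},
\]

and the same bound holds for $\bigl|\frac{F_N(R)}{F(R)} - 1\bigr|$. But $\tilde R \subset R' \subset R$, $F_N(\tilde R)/F(\tilde R) \ge 1 - \lambda_N/24$ and $F_N(R)/F(R) \le 1 + \lambda_N/24$ imply

\[
1 + \frac{\lambda_N}{24} \ge \frac{F_N(R)}{F(R)} \ge \frac{F_N(R')}{F(R)} \ge \frac{F_N(\tilde R)}{F(\tilde R)}\cdot\frac{F(\tilde R)}{F(R)} \ge \Bigl(1-\frac{\lambda_N}{24}\Bigr)(1-\lambda_N) \ge 1 - \frac{25}{24}\lambda_N. \tag{7}
\]

This entails the first event in $\mathcal{A}_N$ due to the inequality $\frac{x(1-x)}{y(1-y)} \ge \min\bigl(\frac{x}{y},\, 2-\frac{x}{y}\bigr)$ for $x,y \in (0,1/2)$.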

For the other events in $\mathcal{A}_N$, note that given the locations $\mathbf{X} = (X_1,\ldots,X_N)$, $\hat p$ and $\hat q$ are independent with $\hat p \sim \mathrm{Bin}(\#R',p)/\#R'$, while $\hat q$ is an average of $N - \#R'$ independent Bernoulli random variables that have probability of success equal to $p$ for the $\#R - \#R'$ locations in $R\setminus R'$ and $q$ for the $N - \#R$ locations in $R^c$. Hence,



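\[
\begin{aligned}
\mathbb{E}_N(\hat q\,|\,\mathbf{X}) &= \frac{(\#R-\#R')p + (N-\#R)q}{N-\#R'}
= q + \frac{F_N(R)-F_N(R')}{1-F_N(R')}\,(p-q)\\
&\le q + \frac{\frac{13}{12\cdot 8}\lambda_N}{1-\frac{37}{8\cdot 36}}\,(p-q) \quad\text{on (7) as } \lambda_N \le \tfrac{2}{3},\ F(R) \le \tfrac{1}{8}\\
&\le q + \frac{1}{6}\lambda_N(p-q).
\end{aligned}
\]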
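The numerical constants in this bound can be checked with exact rational arithmetic. The sketch below assumes the reading $\lambda_N \le 2/3$ and $F(R) \le 1/8$ of the side conditions, and uses (7) in the form $F_N(R) - F_N(R') \le \frac{13}{12}\lambda_N F(R)$ and $F_N(R') \le F(R)(1+\lambda_N/24)$; the variable names are illustrative only.

```python
from fractions import Fraction

lam_max = Fraction(2, 3)   # assumed upper bound on lambda_N
f_max = Fraction(1, 8)     # assumed upper bound on F(R), i.e. block index at least 3

# Coefficient of lambda_N in the numerator: (13/12) * F(R) at its largest value.
numerator_coeff = Fraction(13, 12) * f_max
# Smallest possible value of 1 - F_N(R'): 1 - F(R) * (1 + lambda_N / 24).
denominator = 1 - f_max * (1 + lam_max / 24)

ratio = numerator_coeff / denominator
print(denominator, ratio, ratio <= Fraction(1, 6))  # 251/288 39/251 True
```

Since $39/251 \approx 0.155 < 1/6$, the final line of the display follows.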



Thus, on (7) 
'p- 



Fn 



< 1 



p-q 



Xn 
6 



X 



<FN{p-q-EN{p-q)< [^-l]XN{p-q) 



X 
^ p(l - q)/i^R!+p{l - q)/{N - #R')

, 2\2 f ^2 asq<p 

1142 F{R){l-F{R)) 



X%ND{F{R),p,q) Fn{R'){1 - Fn{R')) 



1142 25 



- A^(2 + &;v(log(l/F(i?)))-V2)iog(l/F(i?)) ■ V 24^^ 

9 • 1142 • 2 2 

< >0 by (9) and as Xn < -■ 

On 3 

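The other two events in $\mathcal{A}_N$ obtain similarly. Above, we used the following properties of $\lambda_N$:

(8) $\lambda_N^2\log N \ge \frac{2}{9}\min(b_N\log 2,\ \log N)$,

For proof of those inequalities, note that $\lambda_N^2\log_2\frac{1}{F(R)} \ge \frac{4}{9}\cdot\frac{b_N\log_2(1/F(R))}{b_N+\log_2(1/F(R))} \ge \frac{2}{9}\min\bigl(b_N,\ \log_2\frac{1}{F(R)}\bigr)$. Now (8) follows since $F(R) \ge 2\log N/N$ implies $\log N/\log_2\frac{1}{F(R)} \ge \log 2$. Applying the above inequality to the LHS of (9), yields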



the lower bound | min(6Ar2/log2(e), ftA? ylog -pr^) > ^In- 

For the proof of (B), note that by the construction of $\mathcal{R}_{\mathrm{app},N}$ in the proof of Corollary 1, the rectangle $R'$ belongs to $\mathcal{R}_{\mathrm{app},N}(2^{-\ell},\ell^{-1/2})$ where
£ := [log2 . Hence, result 2 of Theorem 1 yields #7^app,iv(2~^^~^/^) < 
^ F{R) (^QS2 F{R) )^ some universal constant K. Thus, 

max T(fl)>log— ^+61oglog— ^ + 7 

i?e'Rapp,^(2-*,£-i/2) F{R) F[R) 

K / 1 
(10) <^-^ log2 



F{R) V F{R) 
x_ max Po(r(i?) >log-^ + 61oglog— ^+7 

where $\mathbb{P}_0$ denotes the null distribution, i.e., the permutation distribution, conditional on the $N$ locations and on $\bar p = \frac{\text{total no. of 1's}}{N}$. Thus, under $\mathbb{P}_0$, the number of 1's in a rectangle $R$ follows the hypergeometric distribution where $\#R$ labels are drawn out of $N$, of which $\bar pN$ are 1's. Theorem 4 implies

P„(T(i;)>log^+61oglog^+,) 
xi^(i?)(log-^) exp(-7)



Ktt^ ' \ F{R) 

for 7 large enough, depending on a only. Thus, (10) is not larger than ^ x 
(log2 < -^T^ by the definition of I. Now, (B) follows once it is shown 

that 

(11) Pof, max T{R)>q,{^\] > 



But by the definition of $q_\ell(\cdot)$, the probability in (3) is not larger than $\sum_{\ell\ge 1}\tilde\alpha/\ell^2 = \tilde\alpha\pi^2/6$, hence $\tilde\alpha \ge 6\alpha/\pi^2$ by the definition of $\tilde\alpha$. Now (11) follows from the definition of $q_\ell(\cdot)$. □



Proof of Theorem 2(b1, b2). The idea of the proof of (b1) is classical, see, e.g., Lepski and Tsybakov (2000). Given the prescribed sequence of values $\{F^N(R_N)\}$ and $\{\varepsilon_N\}$, partition $\mathbb{R}^2$ into rectangles $R_1^N, R_2^N, \ldots$ such that $F^N(R_j^N) = F^N(R_N)$ for $j = 1,\ldots,\lfloor 1/F^N(R_N)\rfloor$. This is feasible since $F^N$ is continuous, e.g., by partitioning one axis into intervals. Set $q = q_N := 1/2$ and

\[
p = p_N := \frac{1}{2} + \sqrt{\frac{\log(1/F(R))}{2NF(R)(1-F(R))}}\,\bigl(1-\varepsilon_N/8\bigr),
\]

where for notational simplicity, we write $F(R)$ for $F^N(R_N)$ and also drop the index $N$ from $F^N, p_N, q_N, R_j^N$ in the following. Without loss of generality, we may assume $\varepsilon_N < 8$. Thus, $q < p$ and for $j = 1,\ldots,\lfloor 1/F(R)\rfloor$:

D{F{Rj),p,q) 

^ log{l/F{R)){l-en/8f 
Np 

^ log{l/F{R)){l-ej4)/N 

- l/2 + (log(l/F(i2)))-i/2 y ^ ) 

,(2-../2)fl-2flog ' ^-^^^™^)) 



> (2 - eTv 



' F{R) J J N 

log(l/F(i?)) 
N 
for $N$ large enough, as $\varepsilon_N\sqrt{\log\frac{1}{F(R)}} \to \infty$. We used

(12) NFiR){l - F{R)) > (1 + oil)) log2 N > {1 + o(l)) log^ 



which is a consequence of the assumptions on F{R) stated in the theorem. 

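Denote by $X_i$ the location and by $Y_i$ the Bernoulli random variable that gives the label of the $i$th observation, $i = 1,\ldots,N$. Denote by $\mathbb{P}_{N,0}$ the model where the $X_i$ are i.i.d. $F$ and all the $Y_i$ have probability of success $q$, while we define $\mathbb{P}_{N,j}$ to be the model where instead $Y_i$ has probability of success $p$ if $X_i \in R_j$ and $q$ otherwise, $i = 1,\ldots,N$. Thus, $\mathbb{P}_{N,0}$ belongs to $H_0$. Define the likelihood ratio $L_{N,j}(\mathbf{X},\mathbf{Y}) := \prod_{i=1}^N f_{N,j}(X_i,Y_i)$, where

\[
f_{N,j}(X_i,Y_i) :=
\begin{cases}
\dfrac{p}{q}\,1(Y_i=1) + \dfrac{1-p}{1-q}\,1(Y_i=0), & \text{if } X_i \in R_j,\\[1ex]
1, & \text{otherwise,}
\end{cases}
\]

$j = 1,\ldots,\lfloor 1/F(R)\rfloor$. Hence, if $\phi_N(\mathbf{X},\mathbf{Y})$ is any level $\alpha$ test that depends on the locations $\mathbf{X}$ and the labels $\mathbf{Y}$, then by conditioning on $\mathbf{X}$ one verifies $\mathbb{E}_{N,j}\phi_N(\mathbf{X},\mathbf{Y}) = \mathbb{E}_{N,0}\phi_N(\mathbf{X},\mathbf{Y})L_{N,j}(\mathbf{X},\mathbf{Y})$. We will show that

\[
\mathbb{E}_{N,0}\Biggl|\frac{1}{\lfloor 1/F(R)\rfloor}\sum_{j=1}^{\lfloor 1/F(R)\rfloor} L_{N,j}(\mathbf{X},\mathbf{Y}) - 1\Biggr| \to 0. \tag{13}
\]

Then

\[
\begin{aligned}
\min_{j=1,\ldots,\lfloor 1/F(R)\rfloor}\mathbb{E}_{N,j}\phi_N(\mathbf{X},\mathbf{Y}) - \alpha
&\le \frac{1}{\lfloor 1/F(R)\rfloor}\sum_{j=1}^{\lfloor 1/F(R)\rfloor}\mathbb{E}_{N,j}\phi_N(\mathbf{X},\mathbf{Y}) - \alpha\\
&\le \mathbb{E}_{N,0}\Biggl(\frac{1}{\lfloor 1/F(R)\rfloor}\sum_{j=1}^{\lfloor 1/F(R)\rfloor} L_{N,j}(\mathbf{X},\mathbf{Y}) - 1\Biggr)\phi_N(\mathbf{X},\mathbf{Y}) + o(1)\\
&\le \mathbb{E}_{N,0}\Biggl|\frac{1}{\lfloor 1/F(R)\rfloor}\sum_{j=1}^{\lfloor 1/F(R)\rfloor} L_{N,j}(\mathbf{X},\mathbf{Y}) - 1\Biggr| + o(1)\\
&= o(1)
\end{aligned}
\]

and the claim of (b1) follows. [Note that one can even allow $\phi_N(\mathbf{X},\mathbf{Y})$ to depend on $F^N(R_N)$. Further, the continuity assumption on $F^N$ was only used to allow for the above partition of $\mathbb{R}^2$ into rectangles and can be relaxed accordingly.] To prove (13), note that conditional on $\mathbf{X}$ the $L_{N,j}(\mathbf{X},\mathbf{Y})$ are independent since $L_{N,j}(\mathbf{X},\mathbf{Y})$ is a function of only those $Y_i$ for which $X_i \in R_j$. Further, one verifies $\mathbb{E}_{N,0}L_{N,j}(\mathbf{X},\mathbf{Y}) = \mathbb{E}_{N,0}(L_{N,j}(\mathbf{X},\mathbf{Y})\,|\,\mathbf{X}) = 1$. Thus, we can proceed similarly as in the proof of Lemma 7.4 in Dümbgen and Walther (2008) and obtain (13) once we show that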



max 



(14) 



P 



1 — p 



< C lo: 



1 



-1/2 



m). 

for some constant C, 



(15) .log 



F{R) 



21ogLl/F(i?)J 



oo. 



Now |2-1| 



V 2NF{R){l-Fm log ^ 2(log LfM^ ) '^''^y 



(12) and the same bound obtains for — 1|, proving (14). Finally, 

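\[
\begin{aligned}
\mathbb{E}_{N,0}\bigl(f_{N,1}(X_i,Y_i)-1\bigr)^2
&= \mathbb{E}_{N,0}\Bigl(\Bigl(\frac{p}{q}\,1(Y_i=1) + \frac{1-p}{1-q}\,1(Y_i=0) - 1\Bigr)^2 \,\Big|\, X_i \in R_1\Bigr) \times \mathbb{P}_{N,0}(X_1 \in R_1)\\
&= \frac{2\log(1/F(R))}{NF(R)(1-F(R))}\,\bigl(1-\varepsilon_N/8\bigr)^2 F(R).
\end{aligned}
\]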
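The second-moment identity above can be verified directly. The sketch below assumes $q = 1/2$ and $p = \frac12 + \sqrt{\log(1/F(R))/(2NF(R)(1-F(R)))}\,(1-\varepsilon_N/8)$, as in the construction at the start of this proof; the function names and the numerical values of $N$, $F(R)$ and $\varepsilon_N$ are hypothetical.

```python
import math

def alt_probability(n: int, f_r: float, eps: float) -> float:
    """p = 1/2 + sqrt(log(1/F) / (2 N F (1 - F))) * (1 - eps / 8)."""
    return 0.5 + math.sqrt(math.log(1 / f_r) / (2 * n * f_r * (1 - f_r))) * (1 - eps / 8)

def second_moment(p: float, q: float, f_r: float) -> float:
    """E_0 (f - 1)^2: average over Y ~ Bernoulli(q) given X in R_1, times P(X in R_1)."""
    inside = q * (p / q - 1) ** 2 + (1 - q) * ((1 - p) / (1 - q) - 1) ** 2
    return inside * f_r

def closed_form(n: int, f_r: float, eps: float) -> float:
    """2 log(1/F) / (N F (1 - F)) * (1 - eps / 8)^2 * F."""
    return 2 * math.log(1 / f_r) / (n * f_r * (1 - f_r)) * (1 - eps / 8) ** 2 * f_r

n, f_r, eps = 10_000, 0.01, 0.5   # hypothetical values
p = alt_probability(n, f_r, eps)
print(math.isclose(second_moment(p, 0.5, f_r), closed_form(n, f_r, eps)))
```

With $q = 1/2$ both branches of $f_{N,1} - 1$ have absolute value $2p - 1$, which is why the conditional second moment reduces to $(2p-1)^2$.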



log a: 



< 1 + 



Together with ^ 
(15) is not smaller than 



Tog[x] 



for X >2 one sees that the expression in 



'log 



F{R)] 



:i + l/logLl/F(i?)J)(l 



/8)' 



> Wlo, 




1-F{R) 
1 

\og[l/ F{R)\ 



FiR) 



■ oo 



as $F(R) \to 0$ and $\varepsilon_N\sqrt{\log\frac{1}{F(R)}} \to \infty$, completing the proof of (b1). The

bounds on $F^N(R_N)$ and $b_N$ given in the statement of (b2) guarantee that there exist $p_N$ and $q_N$ such that $D(F^N(R_N),p_N,q_N) \ge b_N/N$, e.g., take $p_N = 1$, $q_N = 0$. Then the claim obtains with a contiguity argument similar to that in the proof of Theorem 4.1(c) in Dümbgen and Walther (2008). □

Proof of Theorem 3. Part (a) continues to hold as intervals on the axes are special cases of axis-parallel rectangles. Parts (b1) and (b2) continue to hold as their proofs do not depend on the dimensionality of the space. In fact, the proof of (b1) already uses a univariate partitioning of one axis into $\lfloor 1/F^N(R_N)\rfloor$ intervals, and the rest of the proof of (b1) goes through verbatim. □



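Proof of Theorem 4. Let $x, k \ge 0$ be integers with $x + k \le \min(n,R)$. Then

\[
\frac{\mathbb{P}(X = x+k)}{\mathbb{P}(X = x)} = \prod_{i=1}^{k}\frac{(R-x-k+i)(n-x-k+i)}{(x+i)(N-R-n+x+i)}
\]

is nonincreasing in $x$. Hence, for $x \ge \lceil m\rceil$:

\[
\mathbb{P}(X \ge x) = \sum_{k\ge 0}\mathbb{P}(X = x+k)
\le \frac{\mathbb{P}(X = x)}{\mathbb{P}(X = \lceil m\rceil)}\sum_{k\ge 0}\mathbb{P}(X = \lceil m\rceil + k)
\le \frac{\mathbb{P}(X = x)}{\mathbb{P}(X = \lceil m\rceil)}.
\]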
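Both facts used in this first step, the monotonicity of the probability ratio and the resulting tail bound, can be checked by brute force for small parameters. The sketch below assumes that $X$ counts the 1's among $n$ labels drawn without replacement from $N$ labels of which $R$ are 1's, and that $m = nR/N$ is the mean of $X$; the function names and the sizes are hypothetical.

```python
from math import ceil, comb

def pmf(x: int, N: int, R: int, n: int) -> float:
    """P(X = x) for the hypergeometric distribution described above."""
    if x < max(0, n - (N - R)) or x > min(n, R):
        return 0.0
    return comb(R, x) * comb(N - R, n - x) / comb(N, n)

def ratio_is_nonincreasing(N: int, R: int, n: int, k: int) -> bool:
    """Check that P(X = x + k) / P(X = x) does not increase with x."""
    lo, hi = max(0, n - (N - R)), min(n, R) - k
    ratios = [pmf(x + k, N, R, n) / pmf(x, N, R, n) for x in range(lo, hi + 1)]
    return all(a >= b * (1 - 1e-9) for a, b in zip(ratios, ratios[1:]))

def tail_bound_holds(N: int, R: int, n: int) -> bool:
    """Check P(X >= x) <= P(X = x) / P(X = ceil(m)) for every x >= ceil(m)."""
    top = min(n, R)
    x0 = ceil(n * R / N)
    for x in range(x0, top + 1):
        tail = sum(pmf(y, N, R, n) for y in range(x, top + 1))
        if tail > pmf(x, N, R, n) / pmf(x0, N, R, n) * (1 + 1e-9):
            return False
    return True

print(ratio_is_nonincreasing(200, 60, 50, 3), tail_bound_holds(200, 60, 50))
```

The small relative tolerances only guard against floating-point rounding; the inequalities themselves are exact.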



The connection between this hypergeometric probability and the log like- 
lihood ratio statistic L obtains by applying Stirling's formula and collecting 
terms: the upper and lower bounds for Stirling's formula in Feller [(1968), 
page 54] yield 

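\[
\begin{aligned}
\log\frac{\mathbb{P}(X = x)}{\mathbb{P}(X = \lceil m\rceil)}
\le{}& -L(x) + L(\lceil m\rceil)
+ \frac{1}{2}\log\frac{\lceil m\rceil(R-\lceil m\rceil)(n-\lceil m\rceil)(N-R-n+\lceil m\rceil)}{x(R-x)(n-x)(N-R-n+x)}\\
&+ \frac{1}{12p(1-p)}\Bigl(\frac{1}{n} + \frac{1}{N-n}\Bigr).
\end{aligned}
\]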



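Set $L(n,\hat p) := n\bigl(\hat p\log\frac{\hat p}{p} + (1-\hat p)\log\frac{1-\hat p}{1-p}\bigr)$. Using $\log\frac{b}{a} \le \frac{b-a}{a}$ for $0 < a \le b$ and Taylor's theorem, respectively, one finds

\[
n\,\frac{(\hat p-p)^2}{p(1-p)} \ge L(n,\hat p) \ge
\begin{cases}
n\,\dfrac{(\hat p-p)^2}{2(1-p)}, & \text{if } \hat p \ge p,\\[1.5ex]
n\,\dfrac{(\hat p-p)^2}{2p}, & \text{if } \hat p < p,
\end{cases}
\]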
which impUes L([m]) < f^fzfji^ + n^) and for p G {p, 

(16) ^("'^) + 1^4^ 
[distinguish the cases p^ (3 +]j)/4], as well as for q G [ jyj_^ ,p): 

(17) L{N-n,q) + l>^. 

First, consider the case [m] <x< min(n,i?). Then (17) gives 
[ml (R — [ml ) R — m p . ^ , 
x[K — x) R — x q 

and analogously (16) implies 

The first inequality of the theorem now follows from the arithmetic- 
geometric means inequality. 

The case x = min(n, R) is treated similarly. For example, if x = n < R 
then logP(X >x) = logP(X = x) < -L{X) + i log(§^) + ^(i + ^) 

and (17) gives ~ I — 4i(-/V — n,q) + 4, which yields the claimed 

inequality. 

The second inequality of the theorem obtains analogously. The third inequality follows from the first two because the function $x \mapsto L(x)$ is strictly decreasing for $x < m$ and strictly increasing for $x > m$. □
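The two-sided bound on $L(n,\hat p)$ used in this proof can be checked numerically. The sketch below takes $L(n,\hat p) = n\bigl(\hat p\log(\hat p/p) + (1-\hat p)\log((1-\hat p)/(1-p))\bigr)$ and tests the upper bound $n(\hat p-p)^2/(p(1-p))$ and the lower bounds $n(\hat p-p)^2/(2(1-p))$ for $\hat p \ge p$ and $n(\hat p-p)^2/(2p)$ for $\hat p < p$. These bounds are stated here as a reading of the proof, and the function names and grid are hypothetical.

```python
import math

def L(n: int, p_hat: float, p: float) -> float:
    """n times the Bernoulli Kullback-Leibler divergence between p_hat and p."""
    return n * (p_hat * math.log(p_hat / p)
                + (1 - p_hat) * math.log((1 - p_hat) / (1 - p)))

def upper(n: int, p_hat: float, p: float) -> float:
    """Upper bound obtained from log(b/a) <= (b - a)/a."""
    return n * (p_hat - p) ** 2 / (p * (1 - p))

def lower(n: int, p_hat: float, p: float) -> float:
    """Lower bound obtained from a second-order Taylor expansion around p."""
    denom = 2 * (1 - p) if p_hat >= p else 2 * p
    return n * (p_hat - p) ** 2 / denom

def sandwich_holds(n: int, p_hat: float, p: float, tol: float = 1e-12) -> bool:
    value = L(n, p_hat, p)
    return lower(n, p_hat, p) - tol <= value <= upper(n, p_hat, p) + tol

grid = [k / 20 for k in range(1, 20)]   # interior points 0.05, ..., 0.95
print(all(sandwich_holds(50, a, b) for a in grid for b in grid))
```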

Proof of Proposition 1. Sorting the data according to the $X$-coordinate requires $O(N\log N)$ steps. Note that the test statistic inside the $n$-loop can be computed in constant time: the rectangle $X_{jk} \times [Y_{(a)}, Y_{(b)}]$ contains $\mathrm{round}(b) - \mathrm{round}(a) + 1$ locations, and the number of their labels that equal 1 is just the cumulative sum vector of the labels evaluated at index $\mathrm{round}(b)$ minus the vector evaluated at $\mathrm{round}(a-1)$. These two quantities are sufficient to calculate the test statistic once the overall number of locations $N$ and the overall number of 1's is known. Thus, there are $O(1/\varepsilon)$ steps for the $n$-loop, and hence $O(2^i/\varepsilon^2)$ for the $m$-loop. Inside the $k$-loop the number of steps required to extract the $N_{jk}$ locations $(X_p,Y_p)$, to sort the corresponding $Y$-values, and to compute the cumulative sum is dominated by the sorting, which requires $O(N_{jk}\log N_{jk})$ steps. (Note that presorting the locations according to their $X$-coordinate allows an efficient extraction.)
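The constant-time evaluation described above rests on a cumulative-sum (prefix-sum) vector of the labels within a vertical strip. A minimal sketch follows, assuming integer ranks $a \le b$ in place of the rounded indices of the text; the function names and the data are hypothetical.

```python
from itertools import accumulate

def prefix_sums(labels):
    """cum[t] = number of 1's among the first t labels, with cum[0] = 0."""
    return [0] + list(accumulate(labels))

def window_counts(cum, a, b):
    """Number of locations and of 1's among the Y-sorted points with ranks a..b (1-based)."""
    return b - a + 1, cum[b] - cum[a - 1]

# Hypothetical strip: labels of its points, already sorted by Y-coordinate.
labels_sorted_by_y = [0, 1, 1, 0, 1, 0, 0, 1]
cum = prefix_sums(labels_sorted_by_y)
print(window_counts(cum, 3, 6))   # ranks 3..6 carry labels 1, 0, 1, 0 -> (4, 2)
```

Building `cum` costs one pass over the strip after sorting; each subsequent window query is two array lookups, which is what makes the innermost loop constant time.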

Thus, the total number of steps in the algorithm is bounded by 

log2(iV/(21og7V)) e il/22i2-i 

0{NlogN)+ Yl EE ^0{N^^\ogN,k) + 0{2^£)). 

1=3 i=0 j=0 k=j+l 
By definition, $N_{jk} \le (k-j)\varepsilon s2^iN \le 2^{i-\ell}N$. Thus, the above sum is not larger
than

0(iV log iV) 

log2(Af/{21ogAr)) £ 
+ C ^^2V'(2''-^iVlogiV + 2^^) 

i=S i=0 

log2(iV/(21ogiV)) 

<CNlogN f 

<CiV(logiV)^ 
where the constant $C$ may change from line to line. □